Data types

Today we'll take a look at the standard data types that come with Python and some examples of how we can use them to represent real-world information. This list isn't comprehensive; there are many more types available, but these are the majority of what you'll see day to day.

Why?

Data types are the building blocks of applications. They are the basic elements we can combine to form more complex structures.

Integers

Integers are whole numbers. They can be either positive or negative. Not much to say here, they do what you'd expect!



In [1]:

    
1









    Out[1]:





1



In [2]:

    
-5









    Out[2]:





-5



In [3]:

    
print 2 + 10 # addition
print 5 - 3 # subtraction
print 6 * 4 # multiplication
print 10 / 5 # division
print 2**4 # exponents

There is something I should mention here. Python 2 can trip people up when they try to do something like the following:



In [4]:

    
2 / 3









    Out[4]:





0

This is because Python 2 returns an integer (rounded down) when you use / to divide two integers. Python 3 changed this so that / always returns a float. Read more about this here: http://python-history.blogspot.com/2009/03/problem-with-integer-division.html

To get the answer you'd expect you'll need at least one of the numbers to be a float:



In [5]:

    
2 / 3.0









    Out[5]:





0.6666666666666666
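
If you'd rather be explicit about which kind of division you want, here is a small sketch. It is written to run the same way on Python 2 and Python 3; the variable names are only for illustration.

```python
# Floor division: // always rounds down, on Python 2 and Python 3 alike.
floor_result = 2 // 3            # 0
negative_floor = -7 // 2         # -4, because "down" means toward negative infinity

# Converting one operand to a float gives the fractional answer.
true_result = float(2) / 3

# divmod() returns the floor quotient and the remainder together.
quotient, remainder = divmod(7, 2)   # (3, 1)
```

Using // when you really want an integer result makes the intent clear no matter which version of Python runs the code.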

Floating point numbers

Floating point numbers can be a bit tricky. Let's take a look at some examples:



In [6]:

    
1.5









    Out[6]:





1.5



In [7]:

    
type(1.5)









    Out[7]:





float

So you might think that floats are simply numbers that have decimal parts but...



In [8]:

    
0.1 + 0.2









    Out[8]:





0.30000000000000004

The Python docs discuss this behavior (https://docs.python.org/2/tutorial/floatingpoint.html#representation-error):

Note that this is in the very nature of binary floating-point: this is not a bug in Python, and it is not a bug in your code either. You’ll see the same kind of thing in all languages that support your hardware’s floating-point arithmetic (although some languages may not display the difference by default, or in all output modes).

For more info:

In short: if you are doing work which requires exact decimal arithmetic, you'll want to use the decimal module. Note that we pass the Decimal class a string as an argument; a float such as 0.1 has already lost the exact value before Decimal ever sees it.



In [9]:

    
from decimal import Decimal
Decimal('0.1') + Decimal('0.2')









    Out[9]:





Decimal('0.3')
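
To see why the string matters, here is a minimal sketch comparing the two ways of building a Decimal. It assumes Python 2.7 or later, where Decimal accepts a float directly; the variable names are made up for this example.

```python
from decimal import Decimal

from_float = Decimal(0.1)       # inherits the float's binary representation error
from_string = Decimal('0.1')    # exactly one tenth
same_value = (from_float == from_string)

# Plain floats fail an exact comparison, Decimals built from strings pass it.
floats_equal = (0.1 + 0.2 == 0.3)
decimals_equal = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))

# If we stay with floats, comparing within a small tolerance is a common workaround.
close_enough = abs((0.1 + 0.2) - 0.3) < 1e-9
```

Here same_value and floats_equal come out False, while decimals_equal and close_enough come out True.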

Strings

Now we get to the fun stuff. Numerical types don't have many methods, but the rest of the types we'll discuss have plenty, and they can be very useful.

Strings are text, generally. A string is any collection of symbols surrounded by quotes:



In [10]:

    
'Hello python learners'









    Out[10]:





'Hello python learners'

Strings can use single, double, and triple quotes:



In [11]:

    
print 'Hello'
print "there"
print '''python'''
print """learners!"""









    



Hello
there
python
learners!

It's useful that we can use all types of quotes, as it allows us to have strings with quotes inside them.



In [12]:

    
print "This string contains single quotes but that's ok since it's surrounded by double quotes"
print 'This string is surrounded by single quotes. The cow says: "mooo"'
print '''This string want's to mix both "types" of quotes and that's ok since we surrounded it with triple quotes! '''









    



This string contains single quotes but that's ok since it's surrounded by double quotes
This string is surrounded by single quotes. The cow says: "mooo"
This string want's to mix both "types" of quotes and that's ok since we surrounded it with triple quotes!
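
Switching quote styles isn't the only option. As a small sketch (the variable names are just for illustration), a backslash can escape a quote inside a string, and triple quotes also let a string span several lines:

```python
# A backslash escapes the quote so it doesn't end the string early.
escaped = 'It\'s fine to escape a quote with a backslash'
plain = "It's fine to escape a quote with a backslash"
same_text = (escaped == plain)

# Triple-quoted strings may contain line breaks.
multi_line = """line one
line two"""
lines = multi_line.splitlines()
```

Both spellings produce the same string, so which one to use is mostly a matter of readability.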



In [13]:

    
"We can " + "concatenate strings " + "together using the + operator"









    Out[13]:





'We can concatenate strings together using the + operator'



In [14]:

    
first = "Sometimes it's better "
middle = "to assign parts of a long string "
last = "to variables then concatenate the strings by the variable names"
sentence = first + middle + last
print sentence









    



Sometimes it's better to assign parts of a long string to variables then concatenate the strings by the variable names



In [15]:

    
("And sometimes "
"we can split a string "
"on seperate lines and they will be "
"put together since they are surrounded by parentheses!")









    Out[15]:





'And sometimes we can split a string on seperate lines and they will be put together since they are surrounded by parentheses!'

One of the most common built-in functions you'll use is len(). As you might imagine, it returns the length of the argument you pass it:



In [16]:

    
name = "Eve"
len(name)









    Out[16]:





3

Strings also come with methods that let us ask questions about a string or build a modified copy of it (the original string is never changed). We can list them using dir():



In [17]:

    
for i in dir('Hello'):
    if not i.startswith('_'):
        print i









    



capitalize
center
count
decode
encode
endswith
expandtabs
find
format
index
isalnum
isalpha
isdigit
islower
isspace
istitle
isupper
join
ljust
lower
lstrip
partition
replace
rfind
rindex
rjust
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill

Here are some examples of what we can do with these methods:



In [18]:

    
word = "hello"
print "capitalize:", word.capitalize() # returns a copy with the first letter capitalized
print "count:", word.count('l') # counts how many times the argument appears in 'word'
print "endswith:", word.endswith('o') # True/False: does 'word' end with the argument?
print "index:", word.index('o') # index of the first occurrence of the argument (remember indexes start at 0)
print "isalpha:", word.isalpha() # methods that start with 'is' give us a clue that the method returns True or False
print "upper:", word.upper() # returns a copy with all letters in uppercase
word_two = "HeLlO"
print "swapcase:", word_two.swapcase() # for every letter in the string, swap between upper and lower case
name = "guido van rossum"
print "title:", name.title() # capitalizes the first letter of each word (handy for names)
sentence = "The quick brown fox"
print "split:", sentence.split() # splits the string on whitespace and returns the words as a list









    



capitalize: Hello
count: 2
endswith: True
index: 4
isalpha: True
upper: HELLO
swapcase: hElLo
title: Guido Van Rossum
split: ['The', 'quick', 'brown', 'fox']
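
One thing worth seeing for yourself: string methods return new strings and leave the original alone. The sketch below shows that, plus join(), which undoes what split() did. The variable names are only for illustration.

```python
word = "hello"
shouted = word.upper()      # a new string; 'word' itself is untouched

# split() breaks a sentence into a list of words...
words = "The quick brown fox".split()

# ...and join() glues a list of strings back together, using the
# string it is called on as the separator.
rejoined = " ".join(words)
dashed = "-".join(words)
```

Note that join() is called on the separator, not on the list, which surprises many people the first time.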

Lists

So far we've talked about data types that exist as singular objects. Now we can move on to data types that act as collections of items. The first we'll discuss is lists.

A list is an ordered series of things. A list can contain objects of any type, including other lists! We use square brackets [] around a comma-separated series of objects to define a list.



In [19]:

    
a = [1, 2, 3]
print a
print len(a)









    



[1, 2, 3]
3



In [20]:

    
b = [1, 'one', 1.0]
print b
print len(b)









    



[1, 'one', 1.0]
3



In [21]:

    
c = [[1, 2, 3], ['one', 'two', 'three'], [1.0, 2.0, 3.0]]
print c
print len(c)









    



[[1, 2, 3], ['one', 'two', 'three'], [1.0, 2.0, 3.0]]
3

Just like strings, lists have methods available for working with them:



In [22]:

    
for i in dir([]):
    if not i.startswith('_'):
        print i









    



append
count
extend
index
insert
pop
remove
reverse
sort

Let's take a look at how these work. We'll start off with a list of two names, alice and bob. From there we'll use each of the methods to modify the 'names' list.



In [23]:

    
names = ['alice', 'bob']
names









    Out[23]:





['alice', 'bob']

append() will add the argument to the end of the list



In [24]:

    
names.append('eve') 
names









    Out[24]:





['alice', 'bob', 'eve']

We'll append again to show off the next method



In [25]:

    
names.append('bob') 
names









    Out[25]:





['alice', 'bob', 'eve', 'bob']

count() tells us how many times the argument occurs in the list



In [26]:

    
print "The word 'bob' is seen:", names.count('bob')









    



The word 'bob' is seen: 2

append() only adds a single item at a time. If we want to extend our original list by several items, we can use the extend() method and pass in a list of things to add to the end.



In [27]:

    
names.extend(['bill', 'sally']) 
names









    Out[27]:





['alice', 'bob', 'eve', 'bob', 'bill', 'sally']
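
A common mix-up is passing a list to append() when extend() was intended. Here is a short sketch of the difference, using two throwaway lists named for this example only:

```python
appended = ['alice', 'bob']
appended.append(['bill', 'sally'])   # adds ONE item, which happens to be a list

extended = ['alice', 'bob']
extended.extend(['bill', 'sally'])   # adds each item of the argument in turn

appended_length = len(appended)      # 3
extended_length = len(extended)      # 4
```

After append() the last element is itself a list, so the outer list has three items rather than four.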

We can find the position of an item using index(). Remember that lists start counting at 0, and that index() returns the position of the first match:



In [28]:

    
print "'sally' is at index:", names.index('sally')









    



'sally' is at index: 5

We can use insert() to put an item at a specific position in the list



In [29]:

    
names.insert(2, 'mike')
names









    Out[29]:





['alice', 'bob', 'mike', 'eve', 'bob', 'bill', 'sally']

pop() can be used for a couple of things. If we simply need to remove the last item from the list, we can call it by itself:



In [30]:

    
names.pop()
names









    Out[30]:





['alice', 'bob', 'mike', 'eve', 'bob', 'bill']

But, we can also keep that last item in another variable:



In [31]:

    
last_person = names.pop()
print names
print last_person









    



['alice', 'bob', 'mike', 'eve', 'bob']
bill
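
pop() also accepts an index, in case the item we want isn't the last one. A minimal sketch, using a fresh list so it doesn't disturb the 'names' list from the cells above:

```python
queue = ['alice', 'bob', 'mike', 'eve']

first_person = queue.pop(0)    # remove and return the item at index 0
second_to_last = queue.pop(-2) # negative indexes count from the end
```

After these two calls, queue holds only 'bob' and 'eve'.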

remove() will remove the first occurrence of the argument we give it. Notice that alice and mike are now next to each other and the last bob is still in the list:



In [32]:

    
names.remove('bob')
names









    Out[32]:





['alice', 'mike', 'eve', 'bob']

reverse() does pretty much what you'd expect it to: it reverses the list in place.



In [33]:

    
names.reverse()
names









    Out[33]:





['bob', 'eve', 'mike', 'alice']

As does sort(), which sorts the list in place:



In [34]:

    
names.sort()
names









    Out[34]:





['alice', 'bob', 'eve', 'mike']
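
Because sort() and reverse() change the list in place, they return None. If we'd like to keep the original list as it is, the built-in sorted() function returns a new list instead. A short sketch with illustrative names:

```python
original = ['bob', 'eve', 'mike', 'alice']

in_place_result = list(original).sort()   # sort() returns None, not the sorted list

ordered = sorted(original)                   # new list, ascending
descending = sorted(original, reverse=True)  # new list, descending
```

A frequent bug is writing something like names = names.sort(), which replaces the list with None.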

Interlude: Index notation

Before we move on to our discussion of tuples I'd like to discuss a common way to select items from objects. If we know the index of an item we can select it like this:



In [35]:

    
print names









    



['alice', 'bob', 'eve', 'mike']



In [36]:

    
print names[0]









    



alice

But this will work for other types as well, such as strings:



In [37]:

    
'alice'[3]









    Out[37]:





'c'
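
Index notation can do a little more than pick a single item. As a quick sketch, negative indexes count from the end, and a start:stop pair (a "slice") selects a range that stops just before the stop index:

```python
names = ['alice', 'bob', 'eve', 'mike']

last_name = names[-1]      # 'mike'
first_two = names[0:2]     # items at index 0 and 1
rest_of_word = 'alice'[1:] # everything from index 1 onward
last_letter = 'alice'[-1]  # 'e'
```

Slicing returns a new object of the same type, so slicing a list gives a list and slicing a string gives a string.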

Tuples

Tuples are a bit like lists but have some very important differences. First let's take a look at how they are similar:

ordered
series of things separated by commas
can be of any length
can be a mix of any type of things

It'll be easier to show their differences through example. First, let's look at a typical tuple:



In [38]:

    
a = (1, 2, 3)
print a
print type(a)









    



(1, 2, 3)
<type 'tuple'>

We normally use parentheses to define a tuple, but really any object followed by a comma becomes a tuple: the comma, not the parentheses, is what creates it. Either way, Python will add the parentheses for us when it displays the tuple:



In [39]:

    
a = "example",
print type(a)
print a









    



<type 'tuple'>
('example',)
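
The flip side is that parentheses alone don't make a tuple. Here is a small sketch of that, along with tuple unpacking, which uses the same comma syntax in reverse. The names are only for illustration.

```python
not_a_tuple = ("example")    # just a string in parentheses
one_item = ("example",)      # the trailing comma makes it a tuple

point = 3, 4                 # no parentheses needed
x, y = point                 # unpacking: x gets 3, y gets 4
```

Forgetting the trailing comma on a one-item tuple is an easy mistake, since the result is silently a different type.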

Probably the most important difference between a list and a tuple has to do with 'immutability.' Let's take a look at an example:



In [40]:

    
names = ['alice', 'bob']
people = ('alice', 'bob')
print names
print people









    



['alice', 'bob']
('alice', 'bob')

So far, not much difference. But let's say that we wanted to get rid of bob and replace him with eve.



In [41]:

    
names[1] = 'eve'
names









    Out[41]:





['alice', 'eve']



In [42]:

    
people[1] = 'eve'









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-09fc922f2379> in <module>()
----> 1 people[1] = 'eve'

TypeError: 'tuple' object does not support item assignment

Uh-oh, Python has told us that the tuple does not allow us to mutate (change) an item the way we can with a list. In other words, lists are mutable and tuples are immutable.

Let's see what methods we have available to us for tuples:



In [43]:

    
for i in dir(()):
    if not i.startswith('_'):
        print i









    



count
index

As a result of the immutability of tuples we don't have many built in methods.
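
Immutable doesn't mean we're stuck, though. We can always build a new tuple out of pieces of the old one. A minimal sketch (the try/except is only there so the example runs to the end):

```python
people = ('alice', 'bob')

# Item assignment raises TypeError, as we saw above.
try:
    people[1] = 'eve'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Instead, combine a slice of the old tuple with a new one-item tuple.
updated = people[:1] + ('eve',)
```

The original tuple is left untouched; 'updated' is a brand new object.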

Dictionaries

So the data types we've seen so far are great for collections of things but there are times where we have pieces of information that are related in some way and we'd like to keep track of those relationships.

Let's start off with an example:



In [44]:

    
eng_to_spn = {'one': 'uno', 
              'two': 'dos', 
              'three': 'tres'}
eng_to_spn









    Out[44]:





{'one': 'uno', 'three': 'tres', 'two': 'dos'}

Here we have a relationship between pairs of strings. Within each pair the two strings are separated by a ':'. The object to the left of the ':' is called the key, and it is the thing we use to look up a pair in the dictionary. The object to the right of the ':' is the value.

So we have a relationship that can be described as English numbers : Spanish numbers

The whole collection of these pairs is the dictionary. We represent dictionaries in Python with curly brackets {}.

Let's try picking out some data from the dictionary:



In [45]:

    
eng_to_spn['one']









    Out[45]:





'uno'

Good, when I use a key to select from the dictionary I get the value associated with that key as a response. Let's try another way:



In [46]:

    
eng_to_spn[0]









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-46-00436721313f> in <module>()
----> 1 eng_to_spn[0]

KeyError: 0

An important thing to note about dictionaries is that while they are similar to lists and tuples, they are looked up by key rather than by position, and in Python 2 they are unordered.

This is where the similarity to a real-world dictionary breaks down. We call this data type a dictionary because it keeps track of the relationship between one thing (a word) and some other thing (its definition). In the real world dictionaries are ordered alphabetically, but in Python 2 the dictionary data type has no order. (Since Python 3.7, dictionaries remember insertion order, but they are still not indexed by position.)

And because of that we can't select the first item using the 0 index like we could with a list or tuple: eng_to_spn[0] looks for a key equal to 0, and there isn't one.

As a matter of fact, iterating over a dict in Python 2 will produce an arbitrary order:



In [47]:

    
for key in eng_to_spn:
    print key









    



three
two
one

Dictionaries are mutable like a list so we can change the relationship of a pair like this:



In [48]:

    
eng_to_spn['one'] = 1
eng_to_spn['two'] = 2
eng_to_spn['three'] = 3
eng_to_spn









    Out[48]:





{'one': 1, 'three': 3, 'two': 2}

We can also add pairs:



In [49]:

    
eng_to_spn['four'] = 4
eng_to_spn









    Out[49]:





{'four': 4, 'one': 1, 'three': 3, 'two': 2}

But for now let's keep the dictionary as a set of English words mapped to their Spanish translations:



In [50]:

    
eng_to_spn = {'one': 'uno', 
              'two': 'dos', 
              'three': 'tres'}
eng_to_spn









    Out[50]:





{'one': 'uno', 'three': 'tres', 'two': 'dos'}

Let's take a look at the methods available to us for dictionaries:



In [51]:

    
for i in dir({}):
    if not i.startswith('_'):
        print i









    



clear
copy
fromkeys
get
has_key
items
iteritems
iterkeys
itervalues
keys
pop
popitem
setdefault
update
values
viewitems
viewkeys
viewvalues

copy() will return a "shallow copy" of the dictionary. I won't get into detail here but if you'd like more information see: http://stackoverflow.com/a/3975388



In [52]:

    
eng_to_spn2 = eng_to_spn.copy()
eng_to_spn2









    Out[52]:





{'one': 'uno', 'three': 'tres', 'two': 'dos'}
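
For a quick feel of what "shallow" means, here is a small sketch using a made-up dictionary that holds a list as one of its values. The copy gets its own set of keys, but the list inside is shared; copy.deepcopy() from the standard library copies the nested objects as well.

```python
import copy

original = {'name': 'alice', 'numbers': [1, 2]}

shallow = original.copy()
shallow['name'] = 'bob'          # rebinding a key only affects the copy
shallow['numbers'].append(3)     # but the nested list is shared with 'original'

deep = copy.deepcopy(original)
deep['numbers'].append(4)        # the deep copy has its own list
```

With values that are strings or numbers, as in eng_to_spn, the difference never shows up, which is why copy() is usually all we need.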

fromkeys() will take the keys from one dict and make a new dict with the same keys but with the value that we specify:



In [53]:

    
eng_to_spn3 = eng_to_spn.fromkeys(eng_to_spn, 'english')
eng_to_spn3









    Out[53]:





{'one': 'english', 'three': 'english', 'two': 'english'}

get() will pull the value from a dictionary:



In [54]:

    
eng_to_spn.get('one')









    Out[54]:





'uno'

What's useful about the get() method is that we can specify a default value in the case that what we are asking for doesn't exist yet in the dictionary. This can avoid errors:



In [55]:

    
eng_to_spn['four']









    



---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-55-9bcba1324848> in <module>()
----> 1 eng_to_spn['four']

KeyError: 'four'



In [56]:

    
print eng_to_spn.get('four', None)









    



None
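
A place where the default value really earns its keep is counting things. The sketch below tallies words in a made-up sentence; get() returns 0 the first time a word is seen, so we never hit a KeyError.

```python
counts = {}
for word in "the quick fox jumps over the lazy dog the end".split():
    # Use 0 as the starting count for words we haven't seen yet.
    counts[word] = counts.get(word, 0) + 1

the_count = counts['the']
missing_count = counts.get('cat', 0)
```

Note that get() never adds the key to the dictionary; it only supplies a stand-in value.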

We can also ask if a key exists using has_key() (Python 2 only; it was removed in Python 3):



In [57]:

    
eng_to_spn.has_key('four')









    Out[57]:





False

Although in this example we could have gotten the same result by doing the following:



In [58]:

    
'four' in eng_to_spn









    Out[58]:





False

Even though there are methods available to us (the ones we can see with dir()), a built-in tool of the language is sometimes the better choice. Here the in operator is preferred over has_key().

We can get the pairs as a list of tuples using the items() method:



In [59]:

    
eng_to_spn.items()









    Out[59]:





[('three', 'tres'), ('two', 'dos'), ('one', 'uno')]

iteritems() returns an iterator that we can call .next() on. This is valuable when we don't want to build a whole list of pairs in memory (as items() does in Python 2) but still want to iterate through the items.



In [60]:

    
items = eng_to_spn.iteritems()
print items.next()
print items.next()









    



('three', 'tres')
('two', 'dos')

We can do the same with the keys using iterkeys():



In [61]:

    
keys = eng_to_spn.iterkeys()
print keys.next()
print keys.next()









    



three
two

We can remove a key and return the value using pop()



In [62]:

    
three = eng_to_spn.pop('three')
print three
eng_to_spn









    



tres






    Out[62]:





{'one': 'uno', 'two': 'dos'}

popitem() will remove an association and return it as a tuple, but you don't get to pick which item you'd like to pop out!



In [63]:

    
anything = eng_to_spn.popitem()
print anything
eng_to_spn









    



('two', 'dos')






    Out[63]:





{'one': 'uno'}

setdefault() works a bit like get() but will set the value for us if it doesn't exist in the dictionary:



In [64]:

    
eng_to_spn.setdefault('four', 'quatro')
eng_to_spn









    Out[64]:





{'four': 'quatro', 'one': 'uno'}
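
setdefault() returns the value it finds or sets, which makes it handy for grouping. Here is a rough sketch that groups some made-up names by their first letter:

```python
groups = {}
for name in ['alice', 'bob', 'anna', 'eve']:
    # Create an empty list the first time a letter is seen,
    # then append to whichever list is stored under that letter.
    groups.setdefault(name[0], []).append(name)

# When the key already exists, setdefault() leaves the value alone.
unchanged = groups.setdefault('b', ['someone else'])
```

Unlike get(), setdefault() does store the default in the dictionary when the key is missing.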

update() allows us to add key-value pairs from another dictionary. If a key already exists, its value is overwritten:



In [65]:

    
new_numbers = {'five': 'cinco', 'six': 'seis'}
eng_to_spn.update(new_numbers)
eng_to_spn









    Out[65]:





{'five': 'cinco', 'four': 'quatro', 'one': 'uno', 'six': 'seis'}

We can see all the values from a dictionary using values()



In [66]:

    
eng_to_spn.values()









    Out[66]:





['quatro', 'cinco', 'seis', 'uno']

These next methods, viewitems(), viewkeys() and viewvalues(), each return a dictionary view object. The Python docs discuss their purpose: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects

The objects returned by dict.viewkeys(), dict.viewvalues() and dict.viewitems() are view objects. They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes. Dictionary views can be iterated over to yield their respective data, and support membership tests:



In [67]:

    
eng_to_spn.viewitems()









    Out[67]:





dict_items([('four', 'quatro'), ('five', 'cinco'), ('six', 'seis'), ('one', 'uno')])



In [68]:

    
eng_to_spn.viewkeys()









    Out[68]:





dict_keys(['four', 'five', 'six', 'one'])



In [69]:

    
eng_to_spn.viewvalues()









    Out[69]:





dict_values(['quatro', 'cinco', 'seis', 'uno'])
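
The "dynamic" part is easiest to see by changing the dictionary after taking the view. The sketch below does that with a small made-up dictionary. Since viewkeys() exists only on Python 2 (on Python 3, keys() itself returns a view), it picks whichever one is available so it runs on both.

```python
numbers = {'one': 'uno'}

# Pick the view-returning method for whichever Python is running.
if hasattr(numbers, 'viewkeys'):
    keys_view = numbers.viewkeys()   # Python 2
else:
    keys_view = numbers.keys()       # Python 3

size_before = len(keys_view)
numbers['two'] = 'dos'               # change the dictionary afterwards
size_after = len(keys_view)          # the same view now sees the new key
has_two = 'two' in keys_view
```

Compare this with the list returned by keys() on Python 2, which is a snapshot and would not notice the new key.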

We skipped over clear(), but here's a good time to see what it does: clear the dictionary out!



In [70]:

    
eng_to_spn.clear()
eng_to_spn









    Out[70]:





{}


